skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Search for: All records

Creators/Authors contains: "Merchant, Nirav"

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

  1. Free, publicly-accessible full text available March 31, 2026
  2. IntroductionEffective monitoring of insect-pests is vital for safeguarding agricultural yields and ensuring food security. Recent advances in computer vision and machine learning have opened up significant possibilities of automated persistent monitoring of insect-pests through reliable detection and counting of insects in setups such as yellow sticky traps. However, this task is fraught with complexities, encompassing challenges such as, laborious dataset annotation, recognizing small insect-pests in low-resolution or distant images, and the intricate variations across insect-pests life stages and species classes. MethodsTo tackle these obstacles, this work investigates combining two solutions, Hierarchical Transfer Learning (HTL) and Slicing-Aided Hyper Inference (SAHI), along with applying a detection model. HTL pioneers a multi-step knowledge transfer paradigm, harnessing intermediary in-domain datasets to facilitate model adaptation. Moreover, slicing-aided hyper inference subdivides images into overlapping patches, conducting independent object detection on each patch before merging outcomes for precise, comprehensive results. ResultsThe outcomes underscore the substantial improvement achievable in detection results by integrating a diverse and expansive in-domain dataset within the HTL method, complemented by the utilization of SAHI. DiscussionWe also present a hardware and software infrastructure for deploying such models for real-life applications. Our results can assist researchers and practitioners looking for solutions for insect-pest detection and quantification on yellow sticky traps. 
    more » « less
    Free, publicly-accessible full text available November 22, 2025
  3. Abstract Insect pests significantly impact global agricultural productivity and crop quality. Effective integrated pest management strategies require the identification of insects, including beneficial and harmful insects. Automated identification of insects under real-world conditions presents several challenges, including the need to handle intraspecies dissimilarity and interspecies similarity, life-cycle stages, camouflage, diverse imaging conditions, and variability in insect orientation. An end-to-end approach for training deep-learning models, InsectNet, is proposed to address these challenges. Our approach has the following key features: (i) uses a large dataset of insect images collected through citizen science along with label-free self-supervised learning to train a global model, (ii) fine-tuning this global model using smaller, expert-verified regional datasets to create a local insect identification model, (iii) which provides high prediction accuracy even for species with small sample sizes, (iv) is designed to enhance model trustworthiness, and (v) democratizes access through streamlined machine learning operations. This global-to-local model strategy offers a more scalable and economically viable solution for implementing advanced insect identification systems across diverse agricultural ecosystems. We report accurate identification (>96% accuracy) of numerous agriculturally and ecologically relevant insect species, including pollinators, parasitoids, predators, and harmful insects. InsectNet provides fine-grained insect species identification, works effectively in challenging backgrounds, and avoids making predictions when uncertain, increasing its utility and trustworthiness. The model and associated workflows are available through a web-based portal accessible through a computer or mobile device. We envision InsectNet to complement existing approaches, and be part of a growing suite of AI technologies for addressing agricultural challenges. 
    more » « less
  4. Abstract Molecular Dynamics (MD) simulation of biomolecules provides important insights into conformational changes and dynamic behavior, revealing critical information about folding and interactions with other molecules. The collection of simulations stored in computers across the world holds immense potential to serve as training data for future Machine Learning models that will transform the prediction of structure, dynamics, drug interactions, and more. Ideally, there should exist an open access repository that enables scientists to submit and store their MD simulations of proteins and protein-drug interactions, and to find, retrieve, analyze, and visualize simulations produced by others. However, despite the ubiquity of MD simulation in structural biology, no such repository exists; as a result, simulations are instead stored in scattered locations without uniform metadata or access protocols. Here, we introduce MDRepo, a robust infrastructure that provides a relatively simple process for standardized community contribution of simulations, activates common downstream analyses on stored data, and enables search, retrieval, and visualization of contributed data. MDRepo is built on top of the open-source CyVerse research cyber-infrastructure, and is capable of storing petabytes of simulations, while providing high bandwidth upload and download capabilities and laying a foundation for cloud-based access to its stored data. 
    more » « less
  5. BackgroundMolecular Dynamics (MD) simulation of biomolecules provides important insights into conformational changes and dynamic behavior, revealing critical information about folding and interactions with other molecules. This enables advances in drug discovery and the design of therapeutic interventions. The collection of simulations stored in computers across the world holds immense potential to serve as training data for future Machine Learning models that will transform the prediction of structure, dynamics, drug interactions, and more. A needIdeally, there should exist an open access repository that enables scientists to submit and store their MD simulations of proteins and protein-drug interactions, and to find, retrieve, analyze, and visualize simulations produced by others. However, despite the ubiquity of MD simulation in structural biology, no such repository exists; as a result, simulations are instead stored in scattered locations without uniform metadata or access protocols. A solutionHere, we introduce MDRepo, a robust infrastructure that supports a relatively simple process for standardized community contribution of simulations, activates common downstream analyses on stored data, and enables search, retrieval, and visualization of contributed data. MDRepo is built on top of the open-source CyVerse research cyberinfrastructure, and is capable of storing petabytes of simulations, while providing high bandwidth upload and download capabilities and laying a foundation for cloud-based access to its stored data. 
    more » « less
  6. As phenomics data volume and dimensionality increase due to advancements in sensor technology, there is an urgent need to develop and implement scalable data processing pipelines. Current phenomics data processing pipelines lack modularity, extensibility, and processing distribution across sensor modalities and phenotyping platforms. To address these challenges, we developed PhytoOracle (PO), a suite of modular, scalable pipelines for processing large volumes of field phenomics RGB, thermal, PSII chlorophyll fluorescence 2D images, and 3D point clouds. PhytoOracle aims to ( i ) improve data processing efficiency; ( ii ) provide an extensible, reproducible computing framework; and ( iii ) enable data fusion of multi-modal phenomics data. PhytoOracle integrates open-source distributed computing frameworks for parallel processing on high-performance computing, cloud, and local computing environments. Each pipeline component is available as a standalone container, providing transferability, extensibility, and reproducibility. The PO pipeline extracts and associates individual plant traits across sensor modalities and collection time points, representing a unique multi-system approach to addressing the genotype-phenotype gap. To date, PO supports lettuce and sorghum phenotypic trait extraction, with a goal of widening the range of supported species in the future. At the maximum number of cores tested in this study (1,024 cores), PO processing times were: 235 minutes for 9,270 RGB images (140.7 GB), 235 minutes for 9,270 thermal images (5.4 GB), and 13 minutes for 39,678 PSII images (86.2 GB). These processing times represent end-to-end processing, from raw data to fully processed numerical phenotypic trait data. Repeatability values of 0.39-0.95 (bounding area), 0.81-0.95 (axis-aligned bounding volume), 0.79-0.94 (oriented bounding volume), 0.83-0.95 (plant height), and 0.81-0.95 (number of points) were observed in Field Scanalyzer data. We also show the ability of PO to process drone data with a repeatability of 0.55-0.95 (bounding area). 
    more » « less
  7. null (Ed.)